133 research outputs found

Non-linear Attributed Graph Clustering by Symmetric NMF with PU Learning

Author: Maekawa Seiji
Onizuka Makoto
Takeuchi Koh
Publication venue
Publication date: 21/09/2018
Field of study

We consider the clustering problem of attributed graphs. Our challenge is to design an effective and efficient clustering method that precisely captures the hidden relationship between the topology and the attributes in real-world graphs. We propose Non-linear Attributed Graph Clustering by Symmetric Non-negative Matrix Factorization with Positive Unlabeled Learning. Our method has three features: 1) it learns a non-linear projection function between the different cluster assignments of the topology and the attributes of graphs so as to capture the complicated relationship between the topology and the attributes in real-world graphs, 2) it leverages positive unlabeled learning to incorporate the effect of partially observed positive edges into the cluster assignment, and 3) it achieves an efficient computational complexity of O((n^2+mn)kt), where n is the number of vertices, m is the number of attributes, k is the number of clusters, and t is the number of iterations for learning the cluster assignment. We conducted extensive experiments comparing various clustering methods on various real datasets to validate that our method outperforms existing clustering methods in clustering quality.
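As a rough illustration of the symmetric NMF building block named in the abstract above, the sketch below factorizes a symmetric non-negative matrix A as H H^T with H >= 0 and reads cluster labels from the largest entry in each row of H. It is not the authors' method: it ignores attributes, the non-linear projection, and positive unlabeled learning. The damped multiplicative update (with beta = 0.5) is a commonly used rule for symmetric NMF, swapped in here as an assumption, and the names `symnmf`, `assign_clusters`, and `reconstruction_error` are hypothetical.

```python
import random


def matmul(X, Y):
    """Multiply two dense matrices given as lists of rows."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def transpose(X):
    """Return the transpose of a dense matrix given as a list of rows."""
    return [list(col) for col in zip(*X)]


def symnmf(A, k, iters=200, beta=0.5, seed=0, eps=1e-12):
    """Approximate a symmetric non-negative n x n matrix A by H H^T, H >= 0.

    Assumed update rule (not taken from the paper):
        H <- H * (1 - beta + beta * (A H) / (H H^T H)), elementwise.
    """
    rng = random.Random(seed)
    n = len(A)
    H = [[rng.random() for _ in range(k)] for _ in range(n)]
    for _ in range(iters):
        AH = matmul(A, H)               # n x k; the O(n^2 k) step per iteration
        HtH = matmul(transpose(H), H)   # k x k
        HHtH = matmul(H, HtH)           # n x k
        H = [[H[i][j] * (1 - beta + beta * AH[i][j] / (HHtH[i][j] + eps))
              for j in range(k)] for i in range(n)]
    return H


def assign_clusters(H):
    """Assign each vertex to the column of H with the largest value."""
    return [max(range(len(row)), key=row.__getitem__) for row in H]


def reconstruction_error(A, H):
    """Squared Frobenius norm of A - H H^T."""
    R = matmul(H, transpose(H))
    n = len(A)
    return sum((A[i][j] - R[i][j]) ** 2 for i in range(n) for j in range(n))
```

The dense product A H dominates each iteration at O(n^2 k), which matches the n^2 kt term in the stated complexity; the mnkt term would come from the attribute side, which this sketch omits.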

arXiv.org e-Print Archive

Fast, exact, and parallel-friendly outlier detection algorithms with proximity graph in metric spaces

Author: Amagata Daichi
Hara Takahiro
Onizuka Makoto
Publication venue: Springer Science and Business Media Deutschland GmbH
Publication date
Field of study

In many fields, e.g., data mining and machine learning, distance-based outlier detection (DOD) is widely employed to remove noise and find abnormal phenomena, because DOD is unsupervised, can be employed in any metric space, and makes no assumptions about data distributions. Nowadays, data mining and machine learning applications face the challenge of dealing with large datasets, which requires efficient DOD algorithms. We address the DOD problem with two different definitions. Our new idea, which solves both problems, is to exploit an in-memory proximity graph. For each problem, we propose a new algorithm that exploits a proximity graph and analyze an appropriate type of proximity graph for the algorithm. Our empirical study using real datasets confirms that our DOD algorithms are significantly faster than state-of-the-art ones.

Amagata D., Onizuka M., Hara T. Fast, exact, and parallel-friendly outlier detection algorithms with proximity graph in metric spaces. VLDB Journal 31, 797 (2022); https://doi.org/10.1007/s00778-022-00729-1

Osaka University Knowledge Archive

Fast and Exact Outlier Detection in Metric Spaces: A Proximity Graph-based Approach

Author: Amagata Daichi
Hara Takahiro
Onizuka Makoto
Publication venue: Association for Computing Machinery
Publication date
Field of study

Distance-based outlier detection is widely adopted in many fields, e.g., data mining and machine learning, because it is unsupervised, can be employed in a generic metric space, and makes no assumptions about data distributions. Data mining and machine learning applications face the challenge of dealing with large datasets, which requires efficient distance-based outlier detection algorithms. Due to the popularization of computational environments with large memory, it is possible to build a main-memory index and detect outliers based on it, which is a promising solution for fast distance-based outlier detection. Motivated by this observation, we propose a novel approach that exploits a proximity graph. Our approach can employ an arbitrary proximity graph and obtains a significant speed-up over state-of-the-art algorithms. However, designing an effective proximity graph raises a challenge, because existing proximity graphs do not consider efficient traversal for distance-based outlier detection. To overcome this challenge, we propose a novel proximity graph, MRPG. Our empirical study using real datasets demonstrates that MRPG detects outliers significantly faster than the state-of-the-art algorithms.
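For readers unfamiliar with the problem, the following is a minimal brute-force baseline, assuming the common (r, k) definition in which an object is an outlier if fewer than k other objects lie within distance r of it. The abstract does not spell out its definition, so this is an assumption, and the function name `distance_based_outliers` is hypothetical. The sketch uses Euclidean distance for concreteness and does not use a proximity graph; it only shows the quadratic-time computation that index-based approaches aim to speed up.

```python
import math


def distance_based_outliers(points, r, k):
    """Return indices of points with fewer than k neighbors within distance r.

    Brute force: O(n^2) distance computations in the worst case.
    """
    outliers = []
    for i, p in enumerate(points):
        neighbors = 0
        for j, q in enumerate(points):
            if i != j and math.dist(p, q) <= r:
                neighbors += 1
                if neighbors >= k:
                    # Early termination: p already has enough neighbors.
                    break
        if neighbors < k:
            outliers.append(i)
    return outliers


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(distance_based_outliers(data, r=1.5, k=2))
```

Any metric could replace `math.dist` here, since the definition only relies on pairwise distances.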

Osaka University Knowledge Archive

Scaling Manifold Ranking Based Image Retrieval

Author: Fujiwara Yasuhiro
Irie Go
Kuroyama Shari
Onizuka Makoto
Publication venue: VLDB Endowment
Publication date: 01/12/2014
Field of study

Manifold Ranking is a graph-based ranking algorithm that has been successfully applied to retrieve images from multimedia databases. Given a query image, Manifold Ranking computes the ranking scores of images in the database by exploiting the relationships among them expressed in the form of a graph. Since Manifold Ranking effectively utilizes the global structure of the graph, it is significantly better at finding intuitive results compared with current approaches. Fundamentally, Manifold Ranking requires an inverse matrix to compute ranking scores and so needs O(n^3) time, where n is the number of images. Manifold Ranking, unfortunately, does not scale to support databases with large numbers of images. Our solution, Mogul, is based on two ideas: (1) it efficiently computes ranking scores using sparse matrices, and (2) it skips unnecessary score computations by estimating upper-bound scores. These two ideas reduce the time complexity of Mogul to O(n) from the O(n^3) of the inverse matrix approach. Experiments show that Mogul is much faster and gives significantly better retrieval quality than a state-of-the-art approximation approach.
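To make the baseline concrete, here is a small sketch of standard Manifold Ranking computed by fixed-point iteration rather than by explicit matrix inversion. It assumes the usual formulation with a symmetrically normalized affinity matrix S = D^{-1/2} W D^{-1/2} and the update f <- alpha * S f + (1 - alpha) * q, whose limit is proportional to (I - alpha S)^{-1} q. This is not Mogul; the function name `manifold_ranking` and the parameter defaults are illustrative assumptions.

```python
import math


def manifold_ranking(W, query, alpha=0.9, iters=200):
    """Rank graph nodes against a query node by iterative score propagation.

    W is a dense symmetric non-negative affinity matrix (list of rows) in
    which every node is assumed to have at least one neighbor.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    # Symmetric normalization: S[i][j] = W[i][j] / sqrt(deg[i] * deg[j]).
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) if W[i][j] else 0.0
          for j in range(n)] for i in range(n)]
    q = [1.0 if i == query else 0.0 for i in range(n)]
    f = q[:]
    for _ in range(iters):
        # Each dense step costs O(n^2); a sparse S would cost O(#edges).
        f = [alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * q[i]
             for i in range(n)]
    return f


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 - 3 with node 0 as the query.
    W = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
    scores = manifold_ranking(W, query=0, alpha=0.5)
    print(sorted(range(len(scores)), key=lambda i: -scores[i]))
```

The iteration avoids the cubic cost of inversion but still touches every node on every step, which is the kind of work the abstract's pruning idea is meant to avoid.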

Caltech Authors

Beyond Real-world Benchmark Datasets: An Empirical Study of Node Classification with GNNs

Author: Maekawa Seiji
Noda Koki
Onizuka Makoto
Sasaki Yuya
Publication venue
Publication date: 28/12/2022
Field of study

Graph Neural Networks (GNNs) have achieved great success on node classification tasks. Despite the broad interest in developing and evaluating GNNs, they have been assessed with limited benchmark datasets. As a result, the existing evaluation of GNNs lacks fine-grained analysis from various characteristics of graphs. Motivated by this, we conduct extensive experiments with a synthetic graph generator that can generate graphs having controlled characteristics for fine-grained analysis. Our empirical studies clarify the strengths and weaknesses of GNNs from four major characteristics of real-world graphs with class labels of nodes, i.e., 1) class size distributions (balanced vs. imbalanced), 2) edge connection proportions between classes (homophilic vs. heterophilic), 3) attribute values (biased vs. random), and 4) graph sizes (small vs. large). In addition, to foster future research on GNNs, we publicly release our codebase that allows users to evaluate various GNNs with various graphs. We hope this work offers interesting insights for future research.

Comment: Accepted to NeurIPS 2022 Datasets and Benchmarks Track. 21 pages, 15 figures
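The homophilic versus heterophilic distinction in the abstract above is often quantified with an edge homophily ratio, the fraction of edges that connect two nodes of the same class. The sketch below computes that ratio. It is only an illustration of the concept under that assumed definition; it is not taken from the authors' codebase, and the function name `edge_homophily` is hypothetical.

```python
def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share a class label.

    Values near 1 indicate a homophilic graph; values near 0 indicate a
    heterophilic one.
    """
    if not edges:
        raise ValueError("graph has no edges")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 - 3 with two classes.
    edges = [(0, 1), (1, 2), (2, 3)]
    labels = [0, 0, 1, 1]
    print(edge_homophily(edges, labels))
```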

arXiv.org e-Print Archive

Scardina: Scalable Join Cardinality Estimation by Multiple Density Estimators

Author: Ito Ryuichi
Onizuka Makoto
Sasaki Yuya
Xiao Chuan
Publication venue
Publication date: 31/03/2023
Field of study

In recent years, machine learning-based cardinality estimation methods have been replacing traditional methods. This change is expected to benefit one of the most important applications of cardinality estimation, the query optimizer, and thereby speed up query processing. However, none of the existing methods precisely estimate cardinalities when relational schemas consist of many tables with strong correlations between tables/attributes. This paper shows that multiple density estimators can be combined to effectively target the cardinality estimation of data with large and complex schemas having strong correlations. We propose Scardina, a new join cardinality estimation method using multiple partitioned models based on the schema structure.

arXiv.org e-Print Archive